Modeling Speed-Accuracy Tradeoff in Adaptive System for Practicing Estimation


Juraj Niznan 
Masaryk University Brno 
niznan@mail.muni.cz


ABSTRACT 

Estimation is useful in situations where an exact answer is
not as important as a quick answer that is good enough. A
web-based adaptive system for practicing estimates is
currently being developed. We propose a simple model for
estimating a student’s latent skill of estimation. This model
combines a continuous measure of correctness with
response-times. The advantage of the model is its simple update
method, which makes it directly applicable in the developed
adaptive system.

1. INTRODUCTION 

Estimation is a very useful skill to possess, particularly in
situations where an exact answer is not as important as
being able to quickly come up with an answer that is good
enough (e.g., total amount on a bill in a restaurant, number
of people in a room, total of the coins in a wallet,
number of cans of paint needed for painting a room, converting
between metric and imperial units). It has been shown that
estimation ability correlates with the ability to solve
computational problems [2, 9, 8]. Because estimation is so useful,
we have decided to develop a computerized adaptive system
that will let its users practice estimating by solving various
tasks.

The adaptive system will include exercises for practicing
numerical estimation (results of basic arithmetic operations,
converting between imperial and metric units, converting
between temperature units, currencies and exchange rates)
and visual estimation (counting the number of objects in a
scene).

In order to provide adaptive behavior of the system, we need
a way of inferring a student’s estimation ability. In our
setting, the binary-valued correctness-based modeling
approach is not suitable. We do not expect the users to input
exact responses; we expect them to input their best
estimates. So our model should work with some measure of the
quality of an answer. Another important point is the
speed-accuracy tradeoff. Figure 1A shows a hypothetical tradeoff
curve for one user with fixed ability. A user can answer a task
very quickly, but the answer will probably be a very rough
estimate. Alternatively, he/she can decide to spend more time
on the task and respond with a more precise answer. Therefore,
response-time should be a vital part of our model.

The system should be able to detect prior skill (i.e., how
good the user was at estimation before he started using the
system), which can be deduced from the first interactions of
the user with the system. The goal of the developed system
is to enable the user to get better at estimating. Therefore,
the proposed model should also take into account the user’s
improvement (or learning) over time. Figure 1B illustrates
answers of several users on one task as red dots. Ideally, the
system will help its users learn to perform near the green
mark, that is, to be fast and accurate.

Figure 1: A) hypothetical speed-accuracy tradeoff
curve, B) goal of the system

The value of the system will also lie in the data that will be
collected. It can be used to answer some interesting research
questions. Does the speed-accuracy tradeoff curve have the
same shape for converting between EUR and USD as for
estimating the number of displayed objects? How do the
learning curves look? Can estimation tasks in one area be
learned more quickly than in another area? How close to
the perfect mark can users push their performance? What is
the influence of a countdown timer on the user’s performance?
What is the appropriate level of challenge that motivates
the users? The last question was addressed in [3], where the
authors tried to validate the Inverted-U Hypothesis
(i.e., we most enjoy challenges that are neither too easy
nor too hard) on data collected from an online estimation
game called Battleship Numberline. They found that the
easier the game was, the longer users played it.

2. MODELS 

In this section, we present a few existing models for
combining correctness and response-times in Item Response
Theory (IRT) and a model for tracking learning currently used
in our other adaptive practice system. We then propose a
simple model that could be used in the system for
practicing estimates. The described models use the logistic function
$\sigma(z) = (1 + e^{-z})^{-1}$. Users of the system (or students) are
indexed by $j$. The items (or tasks, problems, questions) that
the users solve are indexed by $i$.

2.1 Models from IRT 

A typical example of an approach to modeling both
correctness and response-times in Item Response Theory is
that of van der Linden [10]. The approach uses two models,
one for correctness (binary) and the other for response-times
(distributed lognormally). The probability of success
of a student $j$ on item $i$ can be expressed by the 3PL model:

$$P_{ij} = c_i + (1 - c_i) \cdot \sigma(a_i(\theta_j - b_i))$$

where parameter $\theta_j$ is the skill of student $j$ and $a_i$, $b_i$, $c_i$ are
the discrimination, difficulty and pseudo-guessing
parameters for item $i$. The logarithm of a response-time $t_{ij}$ can
be predicted by:

$$\ln t_{ij} = \beta_i - \tau_j \qquad (1)$$

where $\beta_i$ represents the amount of labor required to solve
item $i$ and $\tau_j$ the speed of student $j$. The disadvantage
of this model is that it does not model the speed-accuracy
tradeoff explicitly.
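The two-part structure can be illustrated with a minimal Python sketch. The function names and all numeric values are hypothetical, chosen only for illustration; they do not come from [10]. Note that the success probability does not take the response-time as an argument, which is the limitation mentioned above.

```python
import math

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def p_correct_3pl(theta_j, a_i, b_i, c_i):
    """3PL probability that student j answers item i correctly."""
    return c_i + (1.0 - c_i) * sigmoid(a_i * (theta_j - b_i))

def expected_log_time(beta_i, tau_j):
    """Predicted log response-time, equation (1): ln t_ij = beta_i - tau_j."""
    return beta_i - tau_j

# Illustrative parameter values only (assumptions, not estimates from data).
p = p_correct_3pl(theta_j=0.5, a_i=1.2, b_i=0.0, c_i=0.25)
t = math.exp(expected_log_time(beta_i=3.0, tau_j=0.5))
print(f"P(correct) = {p:.3f}, predicted time = {t:.1f}")
```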

An example of a model that directly combines binary
correctness with response-time is Roskam’s model [7]:

$$P_{ij} = \sigma(\theta_j + \ln t_{ij} - b_i)$$

Here, an increase in item difficulty (or a decrease in the student’s
ability) can always be compensated for by spending more time
on a problem. This tradeoff is called an increasing
conditional accuracy function.
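The compensation can be made explicit by rewriting the logistic form in terms of odds. The following is a direct algebraic consequence of the model as stated, using $\sigma(z) = e^{z} / (1 + e^{z})$:

```latex
\begin{align*}
P_{ij} &= \sigma(\theta_j + \ln t_{ij} - b_i)
        = \frac{t_{ij}\, e^{\theta_j - b_i}}{1 + t_{ij}\, e^{\theta_j - b_i}}, \\
\frac{P_{ij}}{1 - P_{ij}} &= t_{ij}\, e^{\theta_j - b_i}.
\end{align*}
```

The odds of a correct answer are therefore proportional to the time spent: for example, raising the difficulty $b_i$ by $\ln 2$ is offset exactly by doubling $t_{ij}$.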

2.2 Model for factual knowledge 

Here, we present a model that is currently used in a popular
adaptive system for practicing geographical facts [4]. This
model consists of two parts: one (Elo) estimates the prior
knowledge of a student and the other (PFAE) models
student learning. A big advantage of this model is that
it uses fast online methods of parameter estimation, which
makes it suitable for use in an interactive adaptive practice
system.

The prior knowledge of a student is modeled by the Rasch
(1PL) model. The probability that a student $j$ answers item
$i$ correctly is modeled as $p_{ij} = \sigma(\theta_j - b_i)$. The
parameters are estimated using the Elo rating system [1]. Elo
was originally developed for rating chess players, but the
process of a student answering an item can be interpreted as
a "match" between the student and the item. After each
"match", the parameters are updated as follows:

$$\theta_j := \theta_j + U(n_j) \cdot (\mathit{correct} - p_{ij})$$
$$b_i := b_i + U(n_i) \cdot (p_{ij} - \mathit{correct})$$

where $U(n)$ is the uncertainty function $U(n) = \alpha / (1 + \beta n)$, $n$
is the number of updates of the parameter, and $\alpha$ and $\beta$ are
metaparameters. The variable $\mathit{correct}$ takes value 1 if the
student has answered correctly and value 0 otherwise. This
model is used for predicting, and is trained on, first responses.
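A single Elo "match" can be sketched in a few lines of Python. This is an illustration only: the form of the uncertainty function used here, alpha / (1 + beta * n), and the default values of alpha and beta are assumptions of the sketch, as are the function names.

```python
import math

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def uncertainty(n, alpha=1.0, beta=0.05):
    """Assumed uncertainty function U(n) = alpha / (1 + beta * n).

    The step size shrinks as the number of updates n grows.
    alpha and beta are placeholder metaparameter values.
    """
    return alpha / (1.0 + beta * n)

def elo_update(theta_j, b_i, n_j, n_i, correct):
    """One 'match' between student j and item i; correct is 1 or 0.

    Returns the updated (theta_j, b_i).
    """
    p_ij = sigmoid(theta_j - b_i)
    new_theta = theta_j + uncertainty(n_j) * (correct - p_ij)
    new_b = b_i + uncertainty(n_i) * (p_ij - correct)
    return new_theta, new_b

# A correct first answer raises the skill and lowers the difficulty.
theta, b = elo_update(theta_j=0.0, b_i=0.0, n_j=0, n_i=0, correct=1)
print(theta, b)
```

Because both parameters move by the same prediction error with opposite signs, a surprising result (a weak student solving a hard item) causes a larger correction than an expected one.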

After the first interaction of a student $j$ with item $i$ has
been observed, we can set the student’s skill in that particular
item to $\theta_{ij} = \theta_j - b_i$. An extended version of Performance
Factors Analysis [5] called PFAE is used to model learning
and to predict the subsequent interactions of the student with
the item. The likelihood of a correct answer is $p_{ij} = \sigma(\theta_{ij})$. The
update to the student’s knowledge of item $\theta_{ij}$ after an observation
is:

$$\theta_{ij} := \begin{cases} \theta_{ij} + \gamma \cdot (1 - p_{ij}) & \text{if the answer was correct} \\ \theta_{ij} + \delta \cdot p_{ij} & \text{if the answer was incorrect} \end{cases}$$

where $\gamma$ and $\delta$ are metaparameters. The reason for two
different metaparameters is that the student also learns during
an incorrect response.
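The PFAE update can be sketched as follows. The function name and the default values of gamma and delta are arbitrary placeholders, not values reported for the system in [4].

```python
import math

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def pfae_update(theta_ij, correct, gamma=1.0, delta=-0.5):
    """Update the student's knowledge of one item after an answer.

    gamma scales the update after a correct answer, delta after an
    incorrect one. Both defaults are placeholders for illustration.
    """
    p_ij = sigmoid(theta_ij)
    if correct:
        return theta_ij + gamma * (1.0 - p_ij)
    return theta_ij + delta * p_ij

# The initial value would come from the Elo part: theta_ij = theta_j - b_i.
theta_ij = 0.0
for answer in (False, True, True):
    theta_ij = pfae_update(theta_ij, answer)
    print(f"{theta_ij:.3f}")
```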


2.3 Proposed model for estimates 

Here, we propose a model that can be used in the
adaptive practice system for estimates. The model combines
Roskam’s model and the update scheme from Elo and PFAE.

A simple extension of the correctness-based modeling to the
setting of practicing estimates is to use a measure of
correctness, or a score: a rational number ranging from 0 to
1. The scoring of an answer could depend on the
domain being practiced by the user. For example, in the
scenario where the user is estimating the number of objects
in a scene, the exact answer would get a score of 1, an answer
deviating by one object a score of 0.8, etc.
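One possible scoring rule for the object-counting scenario is sketched below. The text fixes only the first two values (1 for an exact answer, 0.8 for a deviation of one object); the linear continuation and the floor at 0 are assumptions of this sketch, and the function name is hypothetical.

```python
def count_score(estimate, truth, penalty_per_object=0.2):
    """Score in [0, 1] for an object-count estimate.

    An exact answer scores 1; each object of deviation costs
    penalty_per_object (assumed linear), and the score never
    drops below 0.
    """
    return max(0.0, 1.0 - penalty_per_object * abs(estimate - truth))

for guess in (10, 11, 13, 20):
    print(guess, round(count_score(guess, truth=10), 2))
```

For other domains, such as unit conversion, a rule based on relative rather than absolute error may be more appropriate, since the magnitude of the correct answer varies from item to item.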

The model assumes the same parameters and relationship
as Roskam’s model, but instead of expressing the probability
of a correct answer it specifies the expected score:

$$\hat{s}_{ij} = \sigma(\theta_j + \ln t_{ij} - b_i)$$

Figure 2 shows how the score changes as a function of time
for different values of the user’s skill $\theta_j$ (with fixed $b_i = 0$). It
nicely demonstrates the speed-accuracy tradeoff.
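The shape of these curves can be reproduced numerically with a short Python sketch. The skill values other than -5, the time grid, and the helper `time_to_reach` (which inverts the model to give the time needed for a target expected score) are assumptions made for illustration; the time unit is whatever unit the system records.

```python
import math

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def expected_score(theta_j, b_i, t_ij):
    """Expected score: sigma(theta_j + ln t_ij - b_i)."""
    return sigmoid(theta_j + math.log(t_ij) - b_i)

def time_to_reach(target, theta_j, b_i):
    """Time at which the expected score equals target (0 < target < 1).

    Obtained by inverting the model: ln t = logit(target) - theta_j + b_i.
    """
    logit = math.log(target / (1.0 - target))
    return math.exp(logit - theta_j + b_i)

# Tabulate the score over time for several skills, with b_i fixed at 0.
for theta in (-5.0, 0.0, 2.0):
    row = ", ".join(f"{expected_score(theta, 0.0, t):.3f}" for t in (1, 5, 20, 60))
    print(f"theta = {theta:+.0f}: {row}")
```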



Figure 2: Score function for different values of skill
(horizontal axis: time)

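After observing the score $s_{ij}$ that user $j$ obtained for answering
item $i$ and the response-time $t_{ij}$, we can update the model’s beliefs
in the parameters, where $\hat{s}_{ij}$ denotes the expected score:

$$\theta_j := \begin{cases} \theta_j + \gamma \cdot (s_{ij} - \hat{s}_{ij}) & \text{if } s_{ij} \ge \hat{s}_{ij} \\ \theta_j + \delta \cdot (s_{ij} - \hat{s}_{ij}) & \text{if } s_{ij} < \hat{s}_{ij} \end{cases}$$

$$b_i := b_i + U(n_i) \cdot (\hat{s}_{ij} - s_{ij})$$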
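The update step can be sketched in Python under one reading of the rule above: the skill moves by gamma times the score error when the observed score is at least the expected one and by delta times the error otherwise, while the item difficulty moves in the opposite direction, scaled by the uncertainty function. The form of U(n), all metaparameter values, and the function names are assumptions of this sketch.

```python
import math

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def uncertainty(n, alpha=1.0, beta=0.05):
    """Assumed uncertainty function U(n) = alpha / (1 + beta * n)."""
    return alpha / (1.0 + beta * n)

def update_estimate_model(theta_j, b_i, n_i, s_ij, t_ij, gamma=1.0, delta=0.5):
    """One update after user j answers item i with score s_ij in time t_ij.

    Returns (new_theta_j, new_b_i, expected_score). gamma and delta are
    placeholder metaparameter values.
    """
    s_hat = sigmoid(theta_j + math.log(t_ij) - b_i)
    error = s_ij - s_hat
    step = gamma if error >= 0 else delta
    new_theta = theta_j + step * error
    new_b = b_i + uncertainty(n_i) * (s_hat - s_ij)
    return new_theta, new_b, s_hat

# A better-than-expected answer raises the skill and lowers the difficulty.
print(update_estimate_model(theta_j=0.0, b_i=0.0, n_i=0, s_ij=1.0, t_ij=1.0))
```

Each update needs only the current parameter values and a constant amount of arithmetic, which is what makes the scheme usable online.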

Note that the model uses a single parameter $\theta_j$ for the
student. This is different from the approach taken in PFAE,
where the student has a parameter $\theta_{ij}$ for each item. While
that approach is suitable for modeling the knowledge of facts,
where it is reasonable to assume that the knowledge of one
fact is independent of the knowledge of another, it is not
suitable here. A student’s ability to convert 2 miles to
kilometers is surely dependent on his ability to convert 3 miles to
kilometers.

We propose using a separate model for each concept (e.g.,
estimating the number of objects, conversion from lb to kg,
conversion from EUR to USD). It is true that a student’s ability to
estimate items corresponding to one concept tells us something
about his ability to estimate the other concepts. However,
if the user does not know the conversion rate from EUR to
USD, then being able to estimate the other concepts well will
not help him.

The model can be easily extended by adding a discrimination
parameter $a$ or a guessing parameter $c$ (similarly to the IRT
model): $\hat{s}_{ij} = c + (1 - c) \cdot \sigma(a(\theta_j + \ln t_{ij} - b_i))$. These added
parameters could be either metaparameters of the model or
parameters of the item $i$. The guessing parameter may be
useful for the scenario where the user has to select a value
on a number line.

As we mentioned earlier, this model suffers from the issue
that increasing the time spent on an item always increases the
expected score. This may hold when the
user knows the underlying concept (e.g., the conversion rate
from EUR to USD), but it does not hold when he does not
know it. However, the model uses the logarithm of the response-time,
and the time a student is willing to spend on an item is
limited. Therefore, the model should behave reasonably
over the time interval of interest, as is demonstrated
in Figure 2 by the curve corresponding to $\theta_j = -5$.

3. DISCUSSION 

The model takes the response-time as an input.
Therefore, it cannot be used for predicting response-times
directly. A model similar to (1) can be used for that. The
predicted time and score can be used for item selection (i.e.,
choosing which item to offer the user next). This can be done by
setting a target score and recommending an item with a
predicted score close to the target.
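The selection rule can be sketched as follows, assuming the response-time is predicted by a model of the form (1), so that the predicted log time is beta_i - tau_j. The item identifiers, parameter values, and the target score of 0.75 are hypothetical and serve only as an illustration.

```python
import math

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predicted_score(theta_j, tau_j, b_i, beta_i):
    """Expected score at the predicted time, where ln t_ij = beta_i - tau_j."""
    log_t = beta_i - tau_j
    return sigmoid(theta_j + log_t - b_i)

def select_item(theta_j, tau_j, items, target=0.75):
    """Pick the item whose predicted score is closest to the target.

    items maps a (hypothetical) item id to a (b_i, beta_i) pair.
    """
    return min(
        items,
        key=lambda item_id: abs(
            predicted_score(theta_j, tau_j, *items[item_id]) - target
        ),
    )

items = {"easy": (-1.0, 1.0), "medium": (0.5, 1.5), "hard": (2.0, 2.0)}
print(select_item(theta_j=0.0, tau_j=0.5, items=items))
```

Lowering the target score makes the rule prefer harder items, so the target acts as a single knob for the level of challenge.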

Does the model perform better than a simple 1PL model
that does not use response-times at all? Does it make sense
to add more parameters to the model? How does the model
fare against more complicated models? To be able to
answer these questions, we need to evaluate the
performance of the model. The choice of metric is interesting
because a model can predict both score and response-time.
When considering only the predicted score, a standard
metric like RMSE can be used [6]. Once we have a measure of
performance, we can explore whether the model is well-calibrated
with respect to response-times and whether the model works
similarly well for all the domains (concepts).
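For reference, RMSE over a set of observed and predicted scores can be computed as in the minimal sketch below. The example scores are made up for illustration.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted scores."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up observed scores and model predictions.
print(rmse([1.0, 0.8, 0.4], [0.9, 0.7, 0.6]))
```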

Another question we could ask is how well the
speed-accuracy tradeoff curve that the model assumes corresponds
to reality.